Experimental Errors in QSAR Modeling Sets: What We Can Do and What We Cannot Do
Abstract
Numerous chemical data sets have become available for quantitative structure-activity relationship (QSAR) modeling studies. However, data quality varies across sources, depending on the experimental protocols used. Experimental errors in the modeling sets may therefore lead to poor QSAR models and degrade the predictions for new compounds. In this study, we explored the relationship between the ratio of questionable data in the modeling sets, obtained by simulating experimental errors, and QSAR modeling performance. To this end, we used eight data sets (four continuous endpoints and four categorical endpoints), each extensively curated both in-house and by our collaborators, to create over 1800 QSAR models. Each data set was duplicated to create several new modeling sets with different ratios of simulated experimental errors (i.e., the activities of a portion of the compounds were randomized). Fivefold cross-validation was used to evaluate modeling performance, which deteriorated as the ratio of experimental errors increased. All of the resulting models were also used to predict external sets of new compounds, which were excluded at the beginning of the modeling process. The modeling results showed that compounds with relatively large prediction errors in cross-validation are likely to be those with simulated experimental errors. However, removing a certain number of compounds with large cross-validation prediction errors did not improve the external predictions of new compounds. We conclude that QSAR predictions, especially consensus predictions, can identify compounds with potential experimental errors, but that removing those compounds on the basis of cross-validation is not a reasonable means of improving model predictivity, because of overfitting.
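The error-simulation procedure described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic rather than one of the eight curated sets, closed-form ridge regression stands in for the QSAR methods (which the abstract does not name), errors are simulated by permuting activities within a randomly chosen subset, and the function names `simulate_errors` and `cross_val_predict_ridge` are hypothetical.

```python
import numpy as np

def simulate_errors(y, ratio, rng):
    """Assumed error model: permute the activities of a random fraction of compounds."""
    y_noisy = y.copy()
    n_bad = int(round(ratio * len(y)))
    bad_idx = rng.choice(len(y), size=n_bad, replace=False)
    y_noisy[bad_idx] = rng.permutation(y[bad_idx])
    return y_noisy, bad_idx

def cross_val_predict_ridge(X, y, n_folds, alpha, rng):
    """Out-of-fold predictions from ridge regression (a stand-in for a QSAR model)."""
    n, p = X.shape
    folds = np.array_split(rng.permutation(n), n_folds)
    preds = np.empty(n)
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        X_tr, y_tr = X[train_mask], y[train_mask]
        # Center the training data so the intercept is handled implicitly.
        x_mean, y_mean = X_tr.mean(axis=0), y_tr.mean()
        Xc = X_tr - x_mean
        w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ (y_tr - y_mean))
        preds[test_idx] = (X[test_idx] - x_mean) @ w + y_mean
    return preds

rng = np.random.default_rng(0)

# Synthetic "descriptors" and a continuous "activity" (illustrative only).
n_compounds, n_descriptors = 300, 10
X = rng.normal(size=(n_compounds, n_descriptors))
y = X @ rng.normal(size=n_descriptors) + 0.1 * rng.normal(size=n_compounds)

# Corrupt 20% of the activities, then run fivefold cross-validation.
y_noisy, bad_idx = simulate_errors(y, ratio=0.2, rng=rng)
preds = cross_val_predict_ridge(X, y_noisy, n_folds=5, alpha=1.0, rng=rng)

# Rank compounds by absolute cross-validation error and check how many of the
# worst-predicted compounds are the ones whose activities were corrupted.
abs_err = np.abs(preds - y_noisy)
top = np.argsort(abs_err)[::-1][: len(bad_idx)]
hit_rate = np.isin(top, bad_idx).mean()
print(f"fraction of top-error compounds with simulated errors: {hit_rate:.2f}")
```

A hit rate well above the corruption ratio (0.2 here) corresponds to the abstract's first finding, that large cross-validation errors flag questionable compounds. The sketch does not test the second finding; doing so would require removing the flagged compounds, refitting, and comparing predictions on a held-out external set.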